Efficiently
updating products over multisets
About the notation.
Red bold font indicate a definition. The notation
"<>" is the "not equal" operator. The symbol
"·" is multiplication.
Multiset
operations.
A multiset is a set which can contain more
than one instance of each element in the set. The number of occurrences of an
element a in a set s is here called the member count of
a in s. Usually the member count is a natural number, that is, a
non-negative integer; here it can be any real number. The notation employed here is
non-standard, as is the concept of a multiset allowing real-numbered member counts.
|
Operation: |
Description &
definition: |
|
#( a,
s ) |
Member count of a in s.
Defining operation. |
|
Ø |
The empty set.
For all a: #( a, Ø ) = 0. |
|
{.a.} |
Singleton set.
#( a, {.b.} ) = (if a
= b then 1 else 0). |
|
n·s |
Member count scaling.
#( a, n·s ) = n·#(
a, s ). |
|
s1
(+) s2 |
Multiset union.
#( a, s1 (+) s2 ) =
#( a, s1 ) + #( a, s2 ). |
|
s1
(-) s2 |
Multiset difference.
#( a, s1 (-) s2 ) =
#( a, s1 ) - #( a, s2 ). |
Conceptually a number a is an element (or
member) of a set s iff the member count of e in s is
non-zero.
Product over a multiset.
Let P stand for
the usual product-of operator (uppercase pi, which is difficult to present here since
Microsoft FrontPage, my HTML editor, doesnīt support HTML math extensions). Then,
using the convention that 00 = 1, the product P( s )
over a multiset s can be defined as
P( s ) = Pa(a#(
a, s ))
P( s ) is undefined when #( 0, s ) < 0,
since the product then involves a negative power of 0.
By the definition above, P( s ) <> 0 when #(
0, s ) = 0; in particular, P( Ø ) = 1.
By the definition above, P( s ) = 0 when #( 0, s
) > 0.
Note that P distributes over multiset union, as it should:
P( s1 (+) s2 ) = Pa(a#(
a, s1 (+) s2 )) = Pa(a#(
a, s1 ) + #( a, s2 )) = Pa(a#(
a, s1 )·a#( a, s2 ))
= P( s1 )·P( s2 )
This is just a restatement of the simple in a less than
simple notation, but the property is crucial.
Narrowing: non-zero versus zero
values.
In order to efficiently update the product itīs useful to
differentiate between the zero and non-zero values. To do so itīs convenient to
define a narrowing operation narrow( s, a ),
which extracts the subset of a values from s, as
narrow( s, a
) = #( a, s )·{.a.}
The following is obvious, but as for the distributive
property of P itīs crucial:
#( a, s ) = #( a,
narrow( s, a ) )
The product of non-zero values, Pnz( s ),
and the product of zero values, Pz( s ), can be now defined as
respectively
Pz( s ) = P( narrow( s,
0 ) )
Pnz( s ) = P( s (-)
narrow( s, 0 ) )
and, since P distributes over multiset union, we have
Pnz( s )·Pz( s ) = P( s (-)
narrow( s, 0 ) (+) narrow( s, 0 ) ) = P( s )
Incremental updating of P( s ) is now reduced to
incremental updating of Pz( s ) and Pnz( s ). And Pz( s )
can be computed in constant time from the count of zeros #( 0, s ) = #( 0,
narrow( 0, s ) ):
Pz( s ) = (if #( 0, s ) = 0 then 1 else if
#( 0, s ) > 0 then 0).
If #( 0, s ) < 0 then Pz( s ) is
undefined.
Thus, only the values Pnz( s ) and #( 0, s
) need to be kept and updated in order to compute the product. The actual multiset
need not be represented.
An abstract data type.
To represent the Pnz( s ) and #( 0, s )
values the following abstract data type r (just a representative pair of
numbers) is convenient:
r( a, b )p
= a
r( a, b )z
= b
r( s ) = r( Pnz(
s ), #( 0, s ) )
Pnz( s ) and Pz( s ), and thus P( s
), are trivially expressed in terms of the setīs representative pair:
Pnz( s ) = r( s )p
Pz( s ) = (if r( s )z
= 0 then 1 else if r( s )z > 0 then 0)
P( s ) = Pnz( s )·Pz( s ) = (if r( s
)z = 0 then r( s )p else if r( s )z
> 0 then 0)
And the representative pair is equally trivially updated when
a value is added to or removed from the set:
r( s (+) {.a.} ) = (if a
= 0 then r( r( s )p, r( s )z + 1 ) else r( a·r(
s )p, r( s )z ))
r( s (-) {.a.} ) = (if a
= 0 then r( r( s )p, r( s )z - 1 ) else r( r( s
)p/a, r( s )z ))
These update operations are all thatīs needed, but while
efficient (you canīt do better than constant time) itīs not exactly elegant...
A generalization.
The singleton update operations above are special cases of
the general set update operations
r( s1 (+) s2 ) = r( r( s1
)p·r( s2 )p, r( s1 )z + r( s2
)z )
r( s1 (-) s2 ) = r( r( s1
)p/r( s2 )p, r( s1 )z - r( s2
)z )
These more general but not immediately useful operations
suggest defining the following operations on representative pairs:
r1 Ũ r2
= r( r1p·r2p, r1z + r2z
)
r1 ũ r2
= r( r1p/r2p, r1z
- r2z )
(Note that extreme care must be exercised if these operations
are implemented in terms of complex arithmetic operations, since thereīs no periodicity
here.)
The singleton update operations now become simply
r( s (+) {.a.} ) = r( s
) Ũ r( {.a.} )
r( s (-) {.a.} ) = r( s
) ũ r( {.a.} )
where, by the earlier definitions,
r( {.a.} ) = r( Pnz( {.a.} ),
#( 0, {.a.} ) ) = (if a = 0 then r( 1, 1 ) else r( a,
0 ))
which suggests defining
r( a ) = r( {.a.}
) = (if a = 0 then r( 1, 1 ) else r( a, 0 ))
P( r ) = (if rz
= 0 then rp else if rz > 0 then 0)
corresponding to an identification of the singleton set {.a.}
with the value a. With this identification and the direct definition of P
all calculations can be done in terms of numeric values and representative pairs, where
the multiset is only implied.
Multiset product versus
numeric product.
Multiset products are isomorphic to ordinary numeric
products:
P( r( a ) Ũ r( b ) ) = P( r(
{.a.} (+) {.b.} ) ) = a·b
P( r( a ) ũ r( b ) ) = P( r(
{.a.} (-) {.b.} ) ) = a/b
when
not( a = 0 and b = 0 )
Hence the numeric product laws for commutativity and
associativity hold also for multiset representative pairs.
However, r( 0 ) is not a zero of multiset representative
pairs, and P( r( 0 ) ũ r( 0 ) ) = 1 instead of undefined (or in computer programming, a
NaN).
|